DESCRIPTION

Method and System for Document or Content Off-Loading to a 
Document Repository 

Background of the Invention 

The invention relates to data processing environments with large 
document repositories and more specifically to a method and 
system for off-loading a document's content from a document 
processing system to a remote repository. 

Known client mailing applications like Lotus™ Notes™ or
Microsoft™ Outlook™ contain continuously growing document
repositories, namely the incoming and outgoing notes or emails,
often including large attachments like text documents, graphics
or even storage-consuming digitized pictures. Therefore, e.g. a
Lotus Notes application uses a Lotus Domino™ database from which
a tool like IBM Content Manager CommonStore™ for Lotus Domino
(CSLD) is used to move documents stored in that database to an
archive physically located on a different device like a tape
storage. CSLD then allows access to documents that have
previously been archived.

CSLD also allows access to documents that have been archived
from any archive client application (e.g. scanning applications,
CommonStore for SAP™, etc.). When documents are retrieved from
the archive to a Notes database, a Lotus Notes document is
created.

In most scenarios, such documents are viewed only once with a 
Notes internal or external viewer, and then become obsolete. 
However, such temporary retrieval documents waste resources and
have an impact on the overall performance of the Notes application.
Therefore, users have to delete these documents. But since the
main interest of a user is to view an archived document, there
is actually no need to retrieve a document to Lotus Notes.

It is therefore an object of the present invention to provide a 
method and system for handling content off-loading from a 
document processing system to a large repository which is less 
resource consuming than the prior art approaches. 

Another object is to provide such a method and system which
allow off-loaded content to be retrieved while wasting minimal
resources.

It is yet another object to provide such a method and system
which enable viewing of off-loaded content in a user-friendly
way.

The above objects are achieved by the features of the
independent claims. Advantageous embodiments are subject matter
of the subclaims.

The idea underlying the invention is to provide a URL link to
off-loaded content and to enable display of the content in a
viewing application. In particular, it is proposed to detach the
content from a document, to transfer it to a remote repository,
and to replace it by a placeholder text implemented as a URL
link. The text can contain information e.g. about who off-loaded
or archived the document/content, the time/day of off-loading,
and the original attachment filename, in order to identify the
off-loaded content. The URL link, for instance, is a Notes URL
link hotspot richtext element. That is, when clicked, a browser
is opened, displaying the content associated with the URL link.

That solution is less resource consuming than the prior art
approaches, particularly regarding storage capacity and network
traffic. It can advantageously be used e.g. in mail clients
where mail documents, content or attachments are archived to a
remote repository server, and can be viewed directly without
physically transferring them to the mail client.

It is understood hereby that the above mentioned remote 
repository server can also be a local hard disk. 

The invention can be applied to every known mail client program
or system and enables worldwide viewing of a document.
Preferably, archived documents can be viewed within the mail
client via the URL links, e.g. by using a common web browser
either as a plug-in to the mail client or a separate web browser
that is automatically started when the URL is clicked.

Preferably, an underlying mail server, e.g. Domino server, is 
connected to a web dispatcher component, which is basically a 
stripped-down web server with special archive-related 
functionality. The web dispatcher provides web access to an 
archived content. Hereby, requests to be processed by the web 
dispatcher are sent as HTTP requests with a defined parameter 
set. 

In another embodiment, when a document is off-loaded, it is
assigned a unique identifier (ID). The ID becomes part of the
URL and can be encrypted for security reasons.

In another embodiment, the document's type or the document's 
content type is stored with a document when the document is off- 
loaded. Hereby, the aforementioned web dispatcher can maintain a 
mapping table mapping content types to MIME types. This allows 
the browser to interpret each file correctly. 

In yet another embodiment, the aforementioned browser viewing is
performed from within a search hitlist, i.e. when a search over
the repository returns a hitlist document. For every hit in the
hitlist, a button and a URL link hotspot are displayed. When
the button is pushed, the corresponding content is retrieved.
When the URL link is clicked, the content is viewed in a plug-in
or separate web browser. This allows users to quickly view content
to find out whether it is the desired one. Then, if necessary, it
can be retrieved back to the mail client.

It is noteworthy that, instead of using a web browser for
viewing off-loaded content, any kind of HTTP client tool
can be used.

Brief Description of the Drawings 

In the following, the present invention is described in more 
detail by way of embodiments from which further features and 
advantages of the invention become evident whereby 

Fig. 1 is an overview block diagram showing a document before
and after content off-load according to the invention; 

Fig. 2 is a flow diagram illustrating basic components and 
data flow of a preferred embodiment of the invention; 

Fig. 3A is a diagram showing various steps of a content off- 
load procedure according to the invention; 

Fig. 3B is another diagram showing various steps of content 
retrieval according to the invention; and 

Fig. 3C is another diagram showing another embodiment of content
retrieval via a search over a repository.

Detailed Description of the Drawings

Fig. 1 illustrates the basic concept of the invention by showing
a document before and after content off-load according to the
invention. A mail client 101 has stored a number of email
documents 102 - 104 (doc1, ...), each of them containing content
105 (XYZ) and possibly one or more attachments 106.

Printed: 31-05-2001

DE9-2000-011

During an off-loading procedure, the content 105 and the
possible attachments 106 are detached from the document 102 and
transferred 107 to a remote repository server 108. In the
original document 102, after the off-loading 107, the content
105 is replaced by a placeholder text 109, i.e. a Lotus Notes
URL link hotspot richtext element in the present example.
Possible attachments 106 are replaced by a corresponding URL.
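
The off-loading procedure described above can be illustrated by a minimal Python sketch. It is not the CSLD implementation: the names Document, Repository and off_load, the sequential ID scheme, the "links" field and the "?id=" query layout are all assumptions made for this example only.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a mail document such as document 102."""
    name: str
    content: str
    attachments: dict[str, bytes] = field(default_factory=dict)
    links: dict[str, str] = field(default_factory=dict)


class Repository:
    """Hypothetical in-memory stand-in for the remote repository server 108."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # Sequential IDs are an assumption; the text only requires uniqueness.
        doc_id = f"id-{len(self._store) + 1}"
        self._store[doc_id] = data
        return doc_id

    def get(self, doc_id: str) -> bytes:
        return self._store[doc_id]


def off_load(doc: Document, repo: Repository, base_url: str) -> None:
    """Detach content and attachments, transfer them, leave URL placeholders."""
    # Detach the content, transfer it, and replace it by a URL placeholder.
    content_id = repo.put(doc.content.encode("utf-8"))
    doc.content = f"{base_url}/?id={content_id}"
    # Do the same for every attachment, keeping one URL per original filename.
    for filename, data in list(doc.attachments.items()):
        attachment_id = repo.put(data)
        doc.links[filename] = f"{base_url}/?id={attachment_id}"
        del doc.attachments[filename]
```

After off_load has run, the document holds only URL strings, while the original bytes live in the repository and can be fetched by ID.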

The block diagram depicted in Fig. 2 shows a Lotus Notes
environment as an example of a document processing system. The
system is shown in a state after an off-loading procedure has
already been performed. It comprises a Notes database 201
(Notes DB) holding an exemplary email document 202 in which the
document content was replaced by a URL link hotspot richtext
element 203 as discussed above. The URL text contains information
about who archived the document, the time/day of archiving, and
the original attachment filename. For example:

<<< Attachment 'CSLDCIienlsvs' has been archived by user
'Daniel Haente/Germany/IBM' on '09/20/2000 11:32:18 AM'. >>>
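
As a small illustration, a placeholder text of this shape could be assembled as follows. The function name placeholder_text and its parameters are hypothetical, and the exact wording and delimiters are simply modeled on the example above.

```python
from datetime import datetime


def placeholder_text(filename: str, user: str, archived_at: datetime) -> str:
    """Build the visible text of the URL link placeholder (illustrative only)."""
    # The AM/PM suffix is computed by hand so the result is locale-independent.
    suffix = "AM" if archived_at.hour < 12 else "PM"
    stamp = archived_at.strftime("%m/%d/%Y %I:%M:%S") + " " + suffix
    return (
        f"<<< Attachment '{filename}' has been archived by user "
        f"'{user}' on '{stamp}'. >>>"
    )
```

The three inputs correspond to the three pieces of information named in the text: the original attachment filename, the archiving user, and the time/day of archiving.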

When the URL link 203 is clicked, a browser 204 (in the present example a
Netscape web browser) is started, connecting to the given URL 203. When a
document is off-loaded by CSLD 206, it is assigned a unique identifier (ID).
The ID is encrypted for security reasons and becomes part of the URL 203. An
HTTP "GET" request together with the ID is sent from the web browser to an
HTTP dispatcher 205, which is a stripped-down HTTP server. The goal of the
HTTP dispatcher 205 is to provide web access to archive content. Requests to
be processed by the HTTP dispatcher 205 are sent as HTTP requests with a
defined parameter set.

The CSLD HTTP dispatcher 205 extracts the encrypted ID from the URL 203,
decrypts it, retrieves the content from the repository, and sends it to the
browser, where it is displayed. Of course, for some document types a special
browser plugin is required.
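
The dispatcher steps (extract the ID, decrypt it, fetch the content, answer the request) might look roughly like the following Python sketch. It is a pure function rather than a real HTTP server, the command name "get" and the query layout "?<command>&<token>" are assumptions, and base64 decoding is only a placeholder for the encryption scheme, which the text does not specify.

```python
import base64
from urllib.parse import unquote, urlsplit


def decrypt_id(token: str) -> str:
    # Placeholder only: base64 is NOT encryption. It stands in for the
    # unspecified cipher so that the control flow can be shown.
    return base64.b64decode(token, validate=True).decode("utf-8")


def handle_get(url: str, repository: dict[str, bytes]) -> tuple[int, bytes]:
    """Return (HTTP status, body) for a request URL of the assumed form."""
    query = urlsplit(url).query
    command, _, token = query.partition("&")
    if command != "get" or not token:
        return 400, b"bad request"
    try:
        # Undo percent-encoding first, then recover the document ID.
        doc_id = decrypt_id(unquote(token))
    except ValueError:
        return 400, b"bad token"
    content = repository.get(doc_id)
    if content is None:
        return 404, b"not found"
    return 200, content
```

A dictionary plays the role of the repository here; in the described system the lookup would instead be forwarded to the archive.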

The HTTP dispatcher 205 forwards the request to IBM Content Manager
CommonStore™ for Lotus Domino 206 (CSLD) and requests the content having the
sent ID, referred to in the URL 203. The CSLD 206, in particular, provides an
interface to one or more document repositories 207 - 209. The repository or
repositories, in the present embodiment, is (are) comprised of Tivoli™
Storage Manager™ 207 (TSM), Content Manager™ 208 and Content Manager™
OnDemand™ 209. Each of these three components 207 - 209 can be connected to
one or more tape storage devices 210 - 212. TSM 207 retrieves the content
requested by the HTTP "GET" request and returns it to CSLD 206. Finally, the
retrieved content can be viewed using the Netscape browser 204.

A complete URL 203 computed by CSLD 206 during off-loading consists of the IP
address or host name of the machine running the HTTP dispatcher 205, the HTTP
dispatcher port, the internal command sGet and an encrypted document ID. An
example is:

htto://ix>j3ken.boebl8nQen.cteJt>m.cqm;B08S/?8Qat&PlieTH1W 
Xw1 i ABdcA)F5XBJoYn BHCHRhf X9nG2 VrnYXd% 2Ba1 J 
XAgJ5XBJXTkRRaOFuCEBDUUBdEqAAeQRtMfA4L2M 
VMPBNMToM 
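
The composition of such a URL from its four parts can be sketched in Python as follows. The function names are hypothetical, the command is passed in as a parameter, and base64 again merely stands in for the unspecified encryption. The token is percent-encoded so that characters such as "+" survive transport in a query string.

```python
import base64
from urllib.parse import quote


def encrypt_id(doc_id: str) -> str:
    # Placeholder only: base64 is NOT encryption; the real scheme is unknown.
    return base64.b64encode(doc_id.encode("utf-8")).decode("ascii")


def build_url(host: str, port: int, command: str, doc_id: str) -> str:
    """Combine host, port, command and encrypted ID into a dispatcher URL."""
    # safe="" percent-encodes every reserved character, including "+", "/", "=".
    token = quote(encrypt_id(doc_id), safe="")
    return f"http://{host}:{port}/?{command}&{token}"
```

For instance, build_url("dispatcher.example", 8085, "get", "id-1") yields "http://dispatcher.example:8085/?get&aWQtMQ%3D%3D", where the host, port and command are made-up values.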

When a document is off-loaded by CSLD 206, the document's
content type is stored with the document. The HTTP dispatcher
205 maintains a table (not shown) mapping content types to MIME
types. This allows the web browser 204 to interpret the file
correctly.
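
A mapping table of this kind could be as simple as the following Python sketch. The content-type names on the left are invented for illustration, since the actual table is not shown; the MIME types on the right are standard registered media types.

```python
# Hypothetical content-type names; the actual table is not disclosed.
CONTENT_TYPE_TO_MIME = {
    "PDF": "application/pdf",
    "TXT": "text/plain",
    "JPG": "image/jpeg",
    "HTML": "text/html",
}

# Generic binary type used when the stored content type is unknown.
DEFAULT_MIME = "application/octet-stream"


def mime_for(content_type: str) -> str:
    """Look up the MIME type to send in the Content-Type response header."""
    return CONTENT_TYPE_TO_MIME.get(content_type.upper(), DEFAULT_MIME)
```

Falling back to a generic binary type for unknown entries is a design choice of this sketch; it typically makes a browser offer a download instead of misrendering the file.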

It is noted that the browser viewing feature has nothing to do
with Notes except that the URL link 203 is kept in a mail 
document 202. Therefore, no temporary Notes retrieval documents 
are created. 

Fig. 3A is a diagram showing various steps of a content off-load
procedure according to the invention, illustrated for attachment
archiving. A user starts 301 the off-load procedure by e.g.
pushing an 'archive' button in the Notes client. Alternatively,
the procedure can be triggered automatically 303. The attachment
is detached 302 by CSLD and moved 304 to a repository.
Afterwards, CSLD replaces 305 the attachment(s) by a URL link.

Fig. 3B shows the scenario for a single content retrieval in
case of an off-loaded attachment. The user initiates 311
retrieval by clicking the URL link which opens 312 a web
browser. The web browser sends 313 an HTTP "GET" request to the
server designated in the URL. The HTTP server retrieves 314 the
attachment from the repository via CSLD. The content is sent
back 315 as an HTTP response to the web browser. Finally, the
browser displays 316 the attachment.

Fig. 3C shows the scenario for retrieving an attachment via a
search over the repository. A user initiates 321 a search in the
repository. CSLD performs 322 that search and returns the result
as a Notes hitlist document. From that hitlist the user can
click 323 on a URL representing a certain hit. This opens 324 a
web browser. The web browser sends 325 an HTTP "GET" request to
the server specified in the URL. The HTTP server retrieves 326
the attachment from the repository via CSLD. The content is sent
back 327 as an HTTP response to the web browser. Finally, the
browser displays 328 the attachment.
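
The search scenario can be pictured with the following Python sketch, in which each hit carries a view URL that a browser could open. The Hit class, the search_hitlist function, the title-based index and the unencrypted "?get&<id>" URL layout are all assumptions for illustration; in the described system the ID in the URL would be encrypted.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One entry of a hypothetical hitlist."""
    title: str
    doc_id: str    # what a 'retrieve' button would pass back to the archive
    view_url: str  # what the URL link opens in a browser


def search_hitlist(index: dict[str, str], term: str, base_url: str) -> list[Hit]:
    """Return one Hit per archived item whose title contains the search term."""
    needle = term.lower()
    return [
        Hit(title, doc_id, f"{base_url}/?get&{doc_id}")
        for doc_id, title in sorted(index.items())
        if needle in title.lower()
    ]
```

Keeping both the ID and the view URL on each hit mirrors the two options named in the text: viewing in a browser, or retrieving the content back to the mail client.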

It should be noted that the above described browser viewing 
shall not be confused with the browser viewing feature in a 
Domino web client. With a Notes web client, Notes databases are 
accessed from within a browser. With CSLD browser viewing, 
content in an archive is viewed in a browser without retrieving 
the content to Lotus Notes. Browser viewing also works with the 
Notes web client. That is, it makes no difference whether a 
document URL link is clicked in a document being viewed in the 
Notes client or in a document being viewed in a Domino web 
client. In both cases, no Lotus Notes document is created.

CSLD browser viewing also allows users to forward a URL link to
other users, even to those who have no Notes client installed.
All these users will be able to view the document in a browser.
A further application of CSLD browser viewing is viewing of
archived documents for which no Notes viewer exists, but which
are supported by a browser (native or via plugin).
CLAIMS

1. A method for handling content off-loading from a document 
processing system to a document repository, comprising the 
steps of: 

detaching content from the document;

transferring the detached content to the document
repository;

replacing the content by a URL link placeholder.

2. Method according to claim 1, wherein the content is the 
whole document or at least part of the document. 

3. Method according to claim 1 or 2, wherein the URL link
placeholder contains additional information identifying the
off-loaded content, in particular information about the
user who off-loaded the document/content and/or the
time/day of off-loading and/or an original document/content
designation.

4. Method according to any of claims 1 to 3, wherein the URL
link placeholder is a Notes URL link hotspot richtext
element.

5. Method according to any of the preceding claims, wherein
the detached content is viewed at the client via the URL
link.

6. Method according to claim 5, wherein a web browser is used
for viewing the detached content.

7. Method according to any of the preceding claims, wherein a
web dispatcher component is provided, being a stripped-down
web server that provides web access to off-loaded content.

8. Method according to claim 7, wherein access requests to be 
processed by the web dispatcher are sent as HTTP requests 
with a defined parameter set. 

9. Method according to any of the preceding claims, wherein
a unique identifier is assigned to off-loaded content.

10. Method according to claim 9, wherein the unique identifier 
is part of the URL and/or is encrypted. 

11. Method according to any of the preceding claims, wherein a 
document's content type is stored with the content when it 
is off-loaded. 

12. Method according to claim 11 insofar as referring to any of
claims 7 to 10, wherein the web dispatcher maintains a 
table mapping content types to MIME types. 

13. Method according to any of the preceding claims, comprising 
the steps of: 

performing a search over a repository containing off-loaded
content;

returning a hitlist document; 

for every hit in the hitlist document, displaying a button 
to retrieve the content associated with a hit and/or a URL 
link to view the content associated with a hit. 

14. A system for handling content off-loading from a document 
processing system to a document repository, comprising: 

means for detaching content from the document;

means for transferring the detached content to the document
repository;

means for replacing the content by a URL link placeholder.

15. System according to claim 14, where the client comprises a
web browser or HTTP client tool for viewing the detached
content.

16. System according to claim 14 or 15, comprising a web
dispatcher component being a stripped-down web server that
provides web access to off-loaded content.

17. System according to any of claims 14 to 16, comprising
means for assigning a unique identifier to off-loaded
content.

18. System according to any of claims 14 to 17, comprising 
means for encrypting the unique identifier as part of the 
URL. 

19. System according to any of claims 16 to 18, where the web
dispatcher maintains a table mapping content types to MIME
types.

20. A data processing program for execution in a data 
processing system comprising software code portions for 
performing a method according to any of claims 1 to 13 when 
said program is run on said computer. 

21. A computer program product stored on a computer usable 
medium, comprising computer readable program means for 
causing a computer to perform a method according to any of 
claims 1 to 13 when said program is run on said computer. 
ABSTRACT



Disclosed are a method and system for handling document or 
content off-loading from a document processing system to a large 
repository. The content of a document or any attachment attached 
to the document are detached and physically transferred to a 
remote repository server and replaced by a placeholder text. The 
text contains information e.g. about who archived the document,
the time/day of archiving, and the original attachment filename.
In particular, the placeholder text itself is a URL link, for
instance a Notes URL link hotspot richtext element. That is,
when clicked, a browser is opened, displaying the content
associated with the URL link.

That solution is less resource consuming than the prior art
approaches. It can advantageously be used in mail clients where
mail documents, content or attachments are archived to a remote 
repository server and can be viewed directly without physically 
transferring them. (Fig. 1) 